Stochastic Shortest Path Games and Q-Learning
Abstract
We consider a class of two-player zero-sum stochastic games with finite state and compact control spaces, which we call stochastic shortest path (SSP) games. They are total cost stochastic dynamic games that have a cost-free termination state. Based on their close connection to single-player SSP problems, we introduce model conditions that characterize a general subclass of these games with strong properties: the value function exists and is the unique solution of the Bellman equation, both players have optimal policies that are stationary and deterministic, and the value iteration algorithm, as well as the policy iteration algorithm started from certain well-behaved policies, converges. We then consider the classical Q-learning algorithm for computing the value function of finite state and control SSP games that satisfy our model conditions. Q-learning is a model-free, asynchronous stochastic iterative algorithm, and by the theory of stochastic approximation involving monotone nonexpansive mappings, it is known to converge when the Bellman equation has a unique solution and its iterates are bounded with probability one. We prove the boundedness of the Q-learning iterates and thereby fully establish the convergence of Q-learning for our broad class of SSP game models.

December 2011
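To make the iteration concrete, the following is a minimal sketch of tabular minimax Q-learning for a finite SSP game, assuming hypothetical components: a simulator `sample(s, u, v)` for the transition law, a one-stage cost `cost(s, u, v)`, and state 0 as the absorbing, cost-free termination state. For brevity it uses the pure-strategy minimax of each matrix game; in general the one-stage games must be solved over mixed strategies.

```python
import numpy as np

def minimax_q_learning(n_states, n_u, n_v, sample, cost,
                       n_iters=100_000, seed=0):
    """Tabular minimax Q-learning sketch for a finite SSP game.

    State 0 is the absorbing, cost-free termination state; `sample(s, u, v)`
    simulates a transition and `cost(s, u, v)` is the one-stage cost.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_u, n_v))      # Q[s] is a |U| x |V| matrix game
    visits = np.zeros((n_states, n_u, n_v), dtype=int)
    for _ in range(n_iters):
        # Totally asynchronous: update one (state, action, action) triple.
        s = rng.integers(1, n_states)
        u, v = rng.integers(n_u), rng.integers(n_v)
        s_next = sample(s, u, v)
        visits[s, u, v] += 1
        step = 1.0 / visits[s, u, v]        # diminishing stepsize per triple
        # Pure-strategy upper value of the next state's matrix game, used
        # here for brevity; in general the one-stage game is solved over
        # mixed strategies (a small linear program).
        val_next = 0.0 if s_next == 0 else Q[s_next].max(axis=1).min()
        target = cost(s, u, v) + val_next
        Q[s, u, v] += step * (target - Q[s, u, v])
    return Q
```

With the mixed-strategy game value in place of the pure-strategy minimax, this is the kind of recursion whose convergence the paper establishes once the iterates are shown to be bounded.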
Similar Resources
Reinforcement Learning for Average Reward Zero-Sum Games
We consider Reinforcement Learning for average reward zero-sum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the second on Q-learning for stochastic shortest path games. Convergence is proved using the ODE (Ordinary Differential Equation) method. We further discuss the case where not all the actions are played by the opponent with comparab...
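As a rough illustration of the first algorithm's flavor, here is a hedged sketch of relative (RVI-style) Q-learning adapted to a finite zero-sum average-reward game; it is not necessarily the paper's exact recursion, and the simulator `sample(s, u, v)`, reward `reward(s, u, v)`, and reference triple are hypothetical placeholders. The one-stage games are solved over pure strategies only for brevity.

```python
import numpy as np

def relative_minimax_q(n_states, n_u, n_v, sample, reward,
                       n_iters=200_000, seed=0):
    """Relative (RVI-style) Q-learning sketch for a zero-sum average-reward game."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_u, n_v))
    ref = (0, 0, 0)                 # fixed reference (state, u, v) triple
    for k in range(1, n_iters + 1):
        s = rng.integers(n_states)
        u, v = rng.integers(n_u), rng.integers(n_v)
        s_next = sample(s, u, v)
        step = 1.0 / k**0.7         # diminishing stepsize
        # Pure-strategy lower value, with the maximizer choosing u and the
        # minimizer choosing v; mixed strategies are needed in general.
        val_next = Q[s_next].min(axis=1).max()
        # Subtracting Q at the reference triple pins down the additive
        # offset that is otherwise undetermined in average-reward problems.
        target = reward(s, u, v) + val_next - Q[ref]
        Q[s, u, v] += step * (target - Q[s, u, v])
    return Q   # Q[ref] tracks an estimate of the game's average reward
```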
An Online Convergent Q-learning Algorithm with Linear Function Approximation
We present in this article a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm is convergent. Numerical results on a multi-stage stochastic shortest path problem show t...
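The following is a generic sketch of the two-timescale pattern such algorithms build on, not the article's exact recursion: a fast iterate `w` fits Q-values by semi-gradient updates while a slow iterate `theta`, which defines the greedy policy, tracks `w` with a much smaller stepsize. The feature map `phi(s, a)` and the simulator callbacks `reset_env` and `step_env` are hypothetical placeholders.

```python
import numpy as np

def two_timescale_q(phi, reset_env, step_env, n_actions, dim,
                    n_iters=200_000, seed=0):
    """Generic two-timescale Q-learning sketch with linear features."""
    w = np.zeros(dim)       # fast iterate: fits Q-values for the slow policy
    theta = np.zeros(dim)   # slow iterate: defines the greedy policy
    s = reset_env()         # hypothetical: returns an initial state
    for k in range(1, n_iters + 1):
        a_k = 1.0 / k**0.6              # fast stepsize
        b_k = 1.0 / k                   # slow stepsize: b_k / a_k -> 0
        a = min(range(n_actions), key=lambda u: phi(s, u) @ theta)
        s_next, cost, done = step_env(s, a)
        # Linear approximation: Q(s, a) ~ phi(s, a) @ w.
        q_next = 0.0 if done else min(phi(s_next, u) @ theta
                                      for u in range(n_actions))
        delta = cost + q_next - phi(s, a) @ w
        w += a_k * delta * phi(s, a)    # fast: semi-gradient TD update
        theta += b_k * (w - theta)      # slow: policy parameter tracks w
        s = reset_env() if done else s_next
    return theta
```

Keeping the policy on the slow timescale is what sidesteps the off-policy difficulty mentioned above: from the fast iterate's perspective, the policy it evaluates is effectively frozen.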
Q-learning and policy iteration algorithms for stochastic shortest path problems
We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in Bertsekas and Yu (Math. Oper. Res. 37(1...
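To illustrate the general flavor of such hybrids (and not the authors' exact algorithm), here is a hedged sketch of an optimistic policy iteration variant of Q-learning for a finite single-player SSP problem: the update targets bootstrap with the current policy's Q-value, and the policy is improved only periodically. The simulator `sample(s, u)` is a hypothetical placeholder, and state 0 is the cost-free termination state.

```python
import numpy as np

def optimistic_pi_q_learning(n_states, n_actions, sample,
                             n_iters=100_000, policy_period=500, seed=0):
    """Sketch of a Q-learning / policy iteration hybrid for a finite SSP problem."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions), dtype=int)
    mu = np.zeros(n_states, dtype=int)      # current policy
    for k in range(n_iters):
        if k % policy_period == 0:
            mu = Q.argmin(axis=1)           # periodic policy improvement
        s = rng.integers(1, n_states)       # asynchronous: one pair per step
        u = rng.integers(n_actions)
        s_next, cost = sample(s, u)
        visits[s, u] += 1
        step = 1.0 / visits[s, u]
        # Policy evaluation flavor: the target bootstraps with the current
        # policy's Q-value rather than the full minimization over actions.
        target = cost + (0.0 if s_next == 0 else Q[s_next, mu[s_next]])
        Q[s, u] += step * (target - Q[s, u])
    return Q, Q.argmin(axis=1)
```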
On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite-space stochastic shortest path (SSP) problems, which are total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other ...
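For reference, here is a minimal sketch of the totally asynchronous Q-learning recursion for a finite single-player SSP problem, assuming a hypothetical simulator `sample(s, u)` returning a (next state, cost) pair, with state 0 absorbing and cost-free. Convergence of this recursion hinges on the iterates remaining bounded with probability one, which is exactly the property studied above.

```python
import numpy as np

def ssp_q_learning(n_states, n_actions, sample, n_iters=100_000, seed=0):
    """Classical asynchronous Q-learning sketch for a finite SSP problem."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions), dtype=int)
    for _ in range(n_iters):
        # Totally asynchronous: only one (state, action) pair per iteration.
        s = rng.integers(1, n_states)
        u = rng.integers(n_actions)
        s_next, cost = sample(s, u)
        visits[s, u] += 1
        step = 1.0 / visits[s, u]       # diminishing stepsize per pair
        target = cost + (0.0 if s_next == 0 else Q[s_next].min())
        Q[s, u] += step * (target - Q[s, u])
    return Q
```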